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We propose an informal test for stationarity in a time 
series which checks for the compatibility of nonlinear approx- 
imations to the dynamics made in different segments of the 
sequence. The segments are compared directly, rather than 
via statistical parameters. The approach provides detailed 
information about episodes with similar dynamics during the 
measurement period. Thus physically relevant changes in the 
dynamics can be followed. 
PACS: 05.45. +b 

Almost all methods of time series analysis, traditional 
linear or nonlinear, must assume some kind of stationar- 
ity. Therefore, changes in the dynamics during the mea- 
surement period usually constitute an undesired com- 
plication of the analysis. There are however situations 
where such changes represent the most interesting struc- 
ture in the recording. For example, electroencephalo- 
graphic (EEG) recordings are often taken with the main 
purpose of identifying changes in the dynamical state of 
the brain. Such changes occur e.g. between different 
sleep stages, or between epileptic seizures and normal 
brain activity. In this letter we propose an approach 
to the study of potentially nonstationary signals which 
does not only provide a powerful test for stationarity but 
also allows for a time resolved study of the dynamical 
changes. While testing for stationarity might appear to 
be a technical problem of time series analysis, the analy- 
sis and understanding of nonstationary signals is a topic 
of current research in many areas of science. 

A number of statistical tests for stationarity in a time 
series have been proposed in the literature. Most of the 
tests we are aware of are based on ideas similar to the 
following: Estimate a certain parameter using different 
parts of the sequence. If the observed variations are 
found to be significant, that is, outside the expected sta- 
tistical fluctuations, the time series is regarded as nonsta- 
tionary. In many applications of linear (frequency based) 
time series analysis, stationarity has to be valid only up 
to the second moments ("weak stationarity"). Then, the 
obvious approach is to test for changes in second or- 
der quantities, like the mean, the variance, or the power 
spectrum. See e.g. 0] and references therein. Nonlinear 
statistics which can be used include higher order cor- 
relations, dimensions, Lyapunov exponents, and binned 
probability distributions. Q 

Stationarity can also be tested for without comparing 
running statistical parameters. One such test which is 
particularly useful in the context of correlation dimen- 



sion estimates is the space-time separation plot intro- 
duced in Ref . || . Also the recurrence plot of Rcf. 0] and 
the method proposed in |5j provide related information. 
However, these algorithms do not allow for a time re- 
solved study. Other material concerning nonstationarity 
in a nonlinear setup is found in Ref. || . 

In the following, a novel approach is taken which is 
based on the similarity between parts of the time series 
themselves, rather than the similarity of parameters de- 
rived from the time series by local averages. In partic- 
ular, the (nonlinear) cross-prediction error, that is, the 
predictability of one segment using another segment as 
a data base, will be evaluated. This concept is particu- 
larly useful if nonstationarity is given by changes of the 
shape of an attractor while dynamical invariants remain 
effectively unchanged. Other statistics which measure 
the similarity of time series can be used alternatively. 

Let {x n , n = 1,...,N} be a time series which is 
split the series into adjacent segments of length the 
i-th segment being called Sj = {xu_i)i+x, . . . ,xu}. Tra- 
ditionally, a statistic ji is now computed for each such 
segment. It is then tested if the sequence {7^} is constant 
up to statistical fluctuations. How this is done depends 
on what we know about the properties of the statistic 7, 
in particular its probability distribution. Alternatively, 
one can compare statistics computed on segments to val- 
ues obtained from the full sequence. Note that 7 is typ- 
ically a scalar but vectors like binned distributions can 
also be used. In this paper, we will take a different ap- 
proach and use statistics defined on pairs of segments, 
jij = j(S\, Sj), in particular the cross-prediction error. 

Statistical testing with nonlinear parameters 7 is dif- 
ficult because we can assume very little about the sta- 
tistical properties of 7. Estimators of dimensions and 
Lyapunov exponents do not usually follow normal distri- 
butions. Mean prediction errors are composed of many 
individual errors and thus more likely to be normal. How- 
ever, the empirical errors are not expected to be indepen- 
dent which complicates the estimation of the variance of 
7. By using statistics 7^ on pairs of segments we increase 
the number of parameters computed at a fixed number 
of segments from N/l to (N/l) 2 . It can be argued that 
we gain largely redundant information for the purpose of 
statistical testing since the 7y for different i,j are not 
expected to be independent. However, we will be able to 
detect different and more hidden kinds of nonstationar- 
ity. We can get a more detailed picture about the nature 
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of the changes and, in particular, locate segments of a 
nonstationary sequence which are similar enough for the 
purpose of our analysis and which can therefore be anal- 
ysed together. 

In principle, 7(Sj,Sj) can be any quantity which is 
sensitive to differences in the dynamics in Si resp. Sj. 
Examples of such quantities can be found in Refs. R] 
and [pj . For the application we have in mind, theoretical 
rigor in the definition of 7 is less important than robust- 
ness and the possibility to obtain a stable estimate on 
rather short segments Si,Sj. One statistic which meets 
these criteria is the error of a simple nonlinear prediction 
algorithm. Predictions with locally constant approxima- 
tions yield stable results for sequences of a few hundred 
points or less. Global nonlinear predictions can be per- 
formed with even less points, provided the global ansatz 
is chosen properly. Here we want to avoid the latter non- 
trivial requirement. More attractive from the theoretical 
point of view is the crosscorrelation integral defined in 
Rcf. M. However, it requires longer segments and a low 
noise level in order to obtain stable results without man- 
ual evaluation of scaling plots. 

Let us again stress that the main point of this let- 
ter is to exploit the information contained in the rela- 
tive statistics j(Si,Sj), in addition to that contained in 
the diagonal terms "/(Si, Si). Many nonlinear statistics 
can be naturally generalized to relative quantities. We 
mentioned cross-prediction errors and the crosscorrela- 
tion integral. Lyapunov exponents might be generalized 
by measuring the divergence of pairs of trajectories, one 
taken from Si, one from Sj. 

Let us define the cross-prediction error 7y we will use 
as a statistic to compare segments. It is computed as 
follows. Let X = {x n , n = 1, Nx} and Y = {y n , n = 
1, Ny} be two time series and to be a small integer de- 
noting an embedding dimension. From both time series 
we can form embedding vectors {x n , n = to, Nx — 1} 
resp. {y n , n — m,iVy — 1} in the same to dimen- 
sional phase space, where x n = (x n - m +\, ■ ■ . , x n ). Fur- 
ther let us fix a length scale e. For each y n we want to 
make a prediction one step into the future, that is, given 
Vn = (Vn-m+i, ■■■lUn) we want to estimate y n +i, using 
however X as a data base. A locally constant approxi- 
mation to the dynamics relating x n and x n+ i yields the 
estimate 
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In the above formula, U*(y n ) = {x n > : ||dv — Vn\\ < e } 
is an e-neighborhood of y n , formed however within the 
set X. \U*(y n )\ denotes the number of elements in that 
neighborhood. For isolated points with empty neighbor- 
hoods we take the sample mean of the segment X as 
an estimate y^+i- This or similar schemes are widely 
used for prediction and noise reduction. Our formulation 



however leaves room for the possibility that the approxi- 
mation to the dynamics is performed on a different data 
set X than the actual prediction. If we take X and Y 
to be the same but exclude from Uj (y n ) all 2m — 1 vec- 
tors which share components with y n , then y^f +1 is an 
ordinary out-of-sample prediction of y n +i- P] The root 
mean squared prediction error j(X, Y) of the sequence 
Y, given X, is defined by 
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For X = Y, this is the usual take-one-out, out-of- 
sample prediction error. j(X,Y) probes in how far the 
locally constant approximation to the dynamics of X is 
suitable to predict values in Y. For a stationary time 
series, we expect that "/(Sj, Sj) is independent of i and j 
unless the coherence time of the process is longer than I. 
If there is variability in the sequence on time scales longer 
than Z, be it due to a slow variable or due to a changing 
parameter, the diagonal terms 7(Sj, S\) will be typically 
smaller than those with i ^ j. 

Note that in general j(X, Y) ^ j(Y, X). In particular, 
if the attractor of Y is embedded within the attractor 
of X, for example if Y forms a periodic orbit which is 
present as an unstable orbit in X, points in Y can be 
well predicted using X as a data base. However, Y does 
not contain enough information to predict all points in 
X. While the asymmetry of 7^ can provide valuable 
insights, it may also be confusing in some cases. One can 
then use a symmetrized statistic like 7^ + 7^. 

Let us illustrate the method with a numerical example, 
a generalization of the well known "baker map" 

it v n < a : , 
v n +i = v n /a 

u n+ i = 0.5 + j3u n 



if v n > a 



v n +i = (v n - a) /(I - a) , 



defined for v n £ [0,1], a, (3 s]0, 1[. For this piecewise 
linear mapping, the two Lyapunov exponents can be com- 
puted analytically (see e.g. p0| ): 

Ai = a log — + (1 - a) log — — — 

a 1 — a 

A 2 = log /3 . 

By varying [3 only, we can create sequences with differ- 
ent dynamics but with the same maximal Lyapunov ex- 
ponent. Indeed, we will generate a nonstationary time 
series by varying fj slowly with time: f3 = n/N . We keep 
a = 0.4 fixed and measure N — 40000 points by record- 
ing u + v. From this we subtract the running mean and 
we normalize to unit running variance. The actual time 
series is thus: 
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w n =u n + v n , 



2 




10 15 20 25 30 35 40 
segment number 

FIG. 1. "Diagonal" cross-prediction error for a nonsta- 

tionary sequence of the generalized baker map with a — 0.4 
and (3 = n/N. The total signal of N = 40000 is split into 40 
segments of length 1000. 
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FIG. 2. Mutual predictions between sections of length 1000 
for the baker map time series used in Fig. 



where (•)& denotes the average over indices n! = n — 
k, . . . ,n + k. Here k — 50. The nonstationarity in the 
sequence is very hard to detect since many observables 
remain unchanged. The running mean and variance are 
constant up to finite sample fluctuations, autocorrela- 
tions show only very small variation, finite time esti- 
mates of the largest Lyapunov exponent essentially do 
not change. Figure ^ shows the nonlinear prediction er- 
ror for 40 segments of length 1000 each. We used an 
rn = 2 dimensional embedding and neighbourhoods of 
radius e = 0.25. Only towards the end of the sequence 
one could suspect that something is changing. 

The parameter drift is however revealed by Cross- 
predictions using one segment S| of length I = 1000 as a 
data base to predict values within another segment Sj, 
as can be seen in Fig. ||. Prediction errors are encoded 
as gray scales. Black is used for 7y < 0.3, white for 
7ij > 0.8, and linear shading in between. Predictabil- 
ity degrades rapidly with the temporal distance of the 
segments. 

As a realistic example, we study a recording of the 
breath rate of a human patient during almost a whole 
night (about 5 h), measured twice a second. The data 
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FIG. 3. Prediction error 7^ for segments of a Ion. 

stationary recording of the breath rate of a human. 11 
set was split into 40 segments of 850 points (425 sees 
Errors are normalized to the error of the best AR(1) model. 
Considerable fluctuations are present but there is no indica- 
tion of a qualitative difference between the first and the second 
half of the recording. 



is part of data set B from the Santa Fe Institute time 
series contest held in 1992. It is described in Ref. Jll| . 
Obviously, conditions cannot be assumed to be constant 
during a night's sleep. Changes of the calibration and the 
instantaneous variance, as well as the linear autocorrela- 
tions are easy to detect by standard methods. In order 
to emphasize that the algorithm is sensitive to changes 
in the nonlinear structure, we subtract from the data the 
running mean and divide it by the running root mean 
squared amplitude. Further, all prediction errors are nor- 
malized to the error of the best linear AR(1) model. 

In order to detect nontrivial changes, we split the 
recording into 40 nonoverlapping segments Si of 850 
points (425 sec.) each. Cross-predictions are performed 
using m = 2, and e is chosen to be 0.25 (at unit rms 
amplitude). In Fig. |^ we show the (auto-) prediction 
error ja as a function of segment number. There are 
some fluctuations, most prominent are the lower errors 
for segments 15-18. The cross-prediction error is shown 
in Fig. U as grey shades. Black means 7 < 0.9, white 
7 > 1.3, linear grey shades are used in between. Except 
from the lower errors for i — j (see note S), we see that 
there is a transition around one third of the recording: 
segments up to about 15 are less useful for predictions of 
segments after about 20 and vice versa. That segments 
15-18 are different was aparent from Fig. ^ already. For 
this data set, most nonlinear tests are able to detect that 
nonstationarity is present. The main advantage of the 
present method is that it provides more detailed, time 
resolved information than just a statement that nonsta- 
tionarity has been found. 

The algorithm as described here contains a few param- 
eters which have to be chosen appropriately for each data 
set. The embedding dimension m and neighborhood size 
e should yield good overall predictions. The segment size 
is determined by the tradeoff between statistical stabil- 
ity of 7^ for long segments and finer time resolution for 
short segments. A slight advantage may be gained by 
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FIG. 4. Cross-prediction errors for segments of a recording 
of the breath rate of a human. The figure shows that there is 
a qualitative change in the dynamics around segment 15. 

the use of overlapping segments. Other relative statistics 
than cross-predictions may be used and the table of the 
jij may be interpreted by other means than grey scale 
plotting. In particular, ongoing research is devoted to 
the evaluation of 7^ in terms of cluster analysis. 

In a nonlinear setting, for instance if it is planned to 
apply algorithms from the theory of deterministic chaos 
to a time series, weak stationarity (constant second mo- 
ments) is certainly not enough. Let us further remark 
that the widespread notion that the system which pro- 
duces the time series must remain unchanged during the 
time of measurement is neither a necessary nor a suffi- 
cient condition for stationarity. The reason is that there 
is no a posteriori distinction between a system parameter 
(to remain constant) and a variable (which may evolve 
in time). Thus a system with a rapidly fluctuating pa- 
rameter may yield a stationary time series because these 
fluctuations can be averaged over, while a system with 
constant parameters can produce signals which for all 
practical work must be considered nonstationary. An ex- 
ample for the latter case is given by intermittency where 
the time evolution of some variables may become arbi- 
trarily slow. For processes, stationarity can be defined by 
requiring that the joint probability distribution remains 
constant. Given a finite time series, this probability dis- 
tribution can only be estimated up to statistical fluctu- 
ations. It is problematic to define stationarity on the 
base of such an estimate. In this paper we have taken 
a rather pragmatic point of view and call a signal sta- 
tionary if anything which changes in time (no matter if 
we call it a variable or a parameter) does so on a time 
scale such that the changes average out over times much 
smaller than the duration of the measurement. 

We were able to detect changes in the dynamics of a 
system even if scalar statistics do not change significantly. 
The proposed method is meant to augment known tests 
for stationarity, in particular since it includes the possi- 



bility to find interrelations and similarities between dif- 
ferent parts of a time series. 
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